2018 11 12

The dataset

The divorce data was acquired on November 11, 2018 from the official website of the Polish Central Statistical Office.

The dataset lists the number of divorces by cause for each year between 1999 and 2017.

Two statistics are provided for each cause:

  • one for the number of divorces where the cause is exclusive,
  • and one for the number of divorces where the cause is non-exclusive (i.e. it is listed among other causes).

Cleaning data

The original dataset needed several transformations before it was suitable for plotting.

The following libraries were used:

#load libraries
library(stringr)
library(reshape)
library(data.table)
library(plotly)
library(dplyr)
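The transformations themselves are not reproduced here. As a rough illustration only, the sketch below shows one way a wide table could be brought into the long shape used for plotting. The raw layout (one row per cause and type, one column per year), the object names `raw` and `mdata_sketch`, and all numbers are assumptions made for this example, not the actual source data. It uses base R so that it runs without extra packages.

```r
# Hypothetical raw table: one row per cause and type, one column per year.
# All values are invented for illustration.
raw <- data.frame(
  cause  = c("cause A", "cause A", "cause B", "cause B"),
  type   = c("exclusively", "among_others", "exclusively", "among_others"),
  `2016` = c(50, 30, 10, 20),
  `2017` = c(45, 40, 15, 35),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Stack the year columns into a single "count" column (wide to long).
year_cols <- setdiff(names(raw), c("cause", "type"))
long <- data.frame(
  cause = rep(raw$cause, times = length(year_cols)),
  type  = rep(raw$type, times = length(year_cols)),
  year  = rep(as.integer(year_cols), each = nrow(raw)),
  count = unlist(raw[year_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Put the two count types side by side, one row per cause and year.
excl  <- long[long$type == "exclusively", c("cause", "year", "count")]
among <- long[long$type == "among_others", c("cause", "year", "count")]
names(excl)[3]  <- "exclusively"
names(among)[3] <- "among_others"
mdata_sketch <- merge(among, excl, by = c("cause", "year"))
print(mdata_sketch)
```

The result has the same four columns as the cleaned dataset previewed in the next section: cause, year, among_others and exclusively.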

Final dataset

Below is a random sample of ten rows from the cleaned dataset, which is ready for plotting.

sample_n(mdata,10)
##                                  cause year among_others exclusively
## 34  bad attitude toward family members 2013         9856        1084
## 204                  prolonged absence 2012         5445        1334
## 221                 sexual discordance 2010          882         175
## 201                  prolonged absence 2009          390        1619
## 42                  discordant beliefs 2002           NA          10
## 52                  discordant beliefs 2012         4017         197
## 196                  prolonged absence 2004            0         198
## 219                 sexual discordance 2008           83         173
## 118               housing difficulties 2002          512         141
## 144          incoherence of characters 2009         5404       16499

Interpretability of the type of causes

The exclusive and non-exclusive counts are hard to interpret because the dataset does not appear to apply them consistently.

It is not clear how to interpret the non-exclusive causes. The source does not state whether they cover all divorces in which a particular cause played a part, exclusive or not, or only those in which the cause was listed alongside others.

For some years, the exclusive and non-exclusive counts across all causes add up to the yearly number of divorces reported elsewhere on the website.

For other years, the sums exceed the yearly number of divorces, which suggests that the methodology for compiling the statistics may have differed between years.
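The comparison described above can be sketched as follows. This is a minimal example under stated assumptions: `toy` is a made-up table with the same columns as `mdata`, and `official_totals` is a hypothetical table of yearly totals that would have to be taken from elsewhere on the website. Missing counts are treated as zero, which is also an assumption.

```r
# Toy data in the same shape as `mdata`; the numbers are invented.
toy <- data.frame(
  cause        = rep(c("cause A", "cause B"), times = 2),
  year         = rep(c(2016, 2017), each = 2),
  among_others = c(30, NA, 40, 35),
  exclusively  = c(50, 10, 45, 15),
  stringsAsFactors = FALSE
)

# Hypothetical official yearly totals (assumed, not from the source).
official_totals <- data.frame(year = c(2016, 2017), total = c(90, 100))

# Add both count types per row, treating NA as zero.
toy$listed <- rowSums(toy[, c("among_others", "exclusively")], na.rm = TRUE)

# Sum per year and compare with the official totals.
yearly <- aggregate(listed ~ year, data = toy, FUN = sum)
check <- merge(yearly, official_totals, by = "year")
check$difference <- check$listed - check$total
check$matches <- check$difference == 0
print(check)
```

In this toy example the 2016 sum matches its total, while the 2017 sum exceeds it; these are the two situations described above.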

Compromised interpretability

Therefore, we treat both types of causes independently and, contrary to the original plan, do not express the counts as a share of the total yearly number of divorces.

Instead, we present counts rather than percentages.

The number of divorces by cause (exclusive)

# axis titles
x <- list(title = "Year")
y <- list(title = "Number of divorces by cause")
# one line per cause; the pipe ends the line so that R continues the expression
plot_ly(mdata, x = ~year, y = ~exclusively, color = ~cause,
        linetype = ~cause, type = "scatter", mode = "lines") %>%
  layout(title = "Divorce in Poland by cause (exclusive), 1999-2017",
         xaxis = x, yaxis = y)

Plot: Divorces by cause (exclusive)

The number of divorces by cause (non-exclusive)

# axis titles
x <- list(title = "Year")
y <- list(title = "Number of divorces by cause")
# one line per cause; the pipe ends the line so that R continues the expression
plot_ly(mdata, x = ~year, y = ~among_others, color = ~cause,
        linetype = ~cause, type = "scatter", mode = "lines") %>%
  layout(title = "Divorce in Poland by cause (non-exclusive), 1999-2017",
         xaxis = x, yaxis = y)

Plot: Divorces by cause (non-exclusive)

Conclusions

Among the exclusive causes of divorce in Poland, the ranking appears consistent over the years, with incoherence of characters by far the leading official cause of divorce.

It is followed by infidelity and alcohol abuse.

Among the non-exclusive causes of divorce, the ranking has changed over time, with incoherence of characters becoming the leading non-exclusive cause in the most recent decade.

Alcohol abuse and infidelity have ranked second and third in the last couple of years. Up until 2009 they were the leading non-exclusive causes, much more common than the now-leading incoherence of characters.
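The rankings above were read off the plots. One way to check such a reading numerically is to rank the causes within each year, as in the minimal sketch below. The table `toy` and its values are invented for illustration; with the real data, `mdata` would take its place.

```r
# Toy data with the same column names as `mdata`; values are invented.
toy <- data.frame(
  cause        = rep(c("cause A", "cause B", "cause C"), times = 2),
  year         = rep(c(2008, 2017), each = 3),
  among_others = c(500, 900, 700, 1200, 800, 600),
  stringsAsFactors = FALSE
)

# Rank causes within each year; rank 1 is the most frequent cause.
toy$rank <- ave(-toy$among_others, toy$year, FUN = rank)

# Leading cause per year.
leaders <- toy[toy$rank == 1, c("year", "cause")]
print(leaders)
```

A change in the leading cause between years, as in this toy example, corresponds to lines crossing in the non-exclusive plot.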